1. Introduction


In a previous article in ‘a folha’, published in 2014, the use of free open source software in DGT
(namely the Computer-Assisted Translation tool OmegaT) was presented in general terms.


In this article, which is divided into two parts, we present the following Computer-Assisted
Translation (CAT) applications, which the Directorate-General for Translation (DGT) of the European
Commission has recently made available as free open source software (FOSS):


> OmegaT in DGT (which we call DGT-OmegaT), DGT’s implementation of OmegaT (standard version 3.6)
with adaptations, improvements and new features developed in-house to satisfy DGT’s needs
(Developer: Thomas Cordonnier)

> DGT-OmegaT Wizard (Developer: Elio Fedele)

> Tagwipe (Developers: Elio Fedele and Joao Rosas)

> Teambase (Developer: Thomas Cordonnier)


The project is managed by Fons De Vuyst, Head of the Operational Support Sector in DGT’s 
Informatics Unit. 


In the first part of this article, we present a brief description of: a) DGT’s workflow in the context of 
which these applications have been adapted/developed; b) Tagwipe; and c) 10 features adapted/added 
to OmegaT. 


In the second part of this article, which will be published at the end of 2017, we will present the
DGT-OmegaT Wizard and TeamBase, two applications which still require some work to be “usable” outside
DGT.


Concerning the publication of DGT-OmegaT as open source software, we would like to stress that our
aim is not to build a new community that "competes" with the team developing and maintaining the
original OmegaT. DGT publishes the code simply to allow anyone to look at the features we have built
to satisfy our own needs, and leaves it up to the OmegaT community to decide whether some of them are
of general interest and can be reused and integrated into OmegaT.


These applications can be downloaded and easily tried or used by anyone on any Java 8-compliant
platform (Windows, Linux, MacOS), as both the source code and the executable version are published.
Bear in mind, however, that they are published as-is, without express or implied warranties, and come
without support.


2. Open source software in the European Commission 


The European Commission has been promoting the use (and sharing) of open-source software,
notably within its Framework Programmes on Research and Technological Development: the 7th FP and,
presently, Horizon 2020. Furthermore, the Open Source Software Strategy of the European
Commission stresses that “the Commission services will increasingly participate in open source
software communities to build on the open source building blocks which are used in the Commission's
software”.


So it is in this context that open source applications like OmegaT are being used in DGT, alongside 
commercial applications like SDL Trados Studio, the mainstream CAT tool in DGT. 


DGT has been using OmegaT for prototyping since 2012. For that purpose, version 2.6.0_3 of
OmegaT was first customized and extended with some useful improvements.


Some translators liked OmegaT so much that it has been included in DGT’s IT Landscape, and for
that reason the DGT-OmegaT Wizard was developed to integrate OmegaT into DGT’s particular workflow.
Furthermore, two other applications were developed, which can be used by both OmegaT and SDL
Studio users:


> TagWipe, an application which removes redundant tags inside Office DOCX files.
> TeamBase, which allows the sharing of project memories among translators, who
can be using either of the CAT tools available in DGT.


Recently, DGT made these applications available as free open source software and they can be 
downloaded here: http://185.13.37.79/. 


[Screenshot of the DGT-OmegaT website]


About DGT-OmegaT and associated tools


Welcome to the DGT-OmegaT project, a fork of OmegaT, developed by the Directorate-General for
Translation of the European Commission.


The aim of this project is not to build a new community that “competes” with the team maintaining the
original OmegaT. We publish our code to allow anyone to look at the features we built for our own needs,
and leave it up to the OmegaT community to decide whether some of them can be reused. We do not intend
to encourage users to switch to our version of OmegaT.


DGT-OmegaT is published as-is, without express or implied warranties, and comes without support. The
contact page and the comments facility in the website are there to enable a discussion about the actual
features, not to make feature requests. If you see a bug, please report it using the issues tracker but we do
not guarantee that eventual problems will be solved.


Teambase


Teambase is an alternative way to share work in real time between translators, using a SQL database as an
intermediate. The main advantage compared to using Git or Subversion is that there is no latency: each segment
validated by you is immediately available to all users connected to the same database; they do not have to wait
for the next time you save, automatically or not.


Binary version of DGT-OmegaT contains the plugin to work as client part. For the server part, go here if you
want to install it yourself, or here if you want to rent a server already installed.


Tagwipe


Tag wiper is a perl script which cleans docx files, removing tags which seem totally useless when you open the
file in Word but can appear when you open the file in a CAT tool.


Wizard


DGT-OmegaT Wizard is a tool to manage OmegaT projects (create, list, delete...). Internally it makes the link
between OmegaT and our internal workflow tools.


3. DGT Workflow


DGT’s CAT Environment is explained in detail in the DGT publication Translation tools and
workflow.


Of importance for the purposes of this article is that, in DGT:


> Translation memories relevant for the particular document(s) to be translated are
automatically extracted from Euramis, the in-house repository which stores all the alignments
of translations done in DGT (as well as in some other EU institutions) in the last two
decades and most of the EU legislation currently in force in all official languages.

Retrievals and reference documents are extracted in TMX format and used as external
memories in either of the CAT tools. In DGT-OmegaT, those memories are automatically
copied to the \tm subfolder of each project by the DGT-OmegaT Wizard.

> Machine translation is automatically generated for all editable documents to be translated.
This task is performed by MT@EC, the in-house machine translation system based on the
free open source system Moses, which is trained with EU corpora and tailored to DGT
needs.

The TMX files generated by MT@EC are used as external memories in either of the CAT
tools. In DGT-OmegaT, those files are automatically copied to the \mt subfolder of each
project by the DGT-OmegaT Wizard.

> Terminology is stored in IATE (Inter-Active Terminology for Europe), the EU
inter-institutional terminology database.

In DGT-OmegaT projects, the relevant terms for the particular document(s) to be translated
are extracted in TXT format from an export (source and target terms only) of the whole IATE
and this “filtered” glossary is automatically copied to the \glossary subfolder of each project
by the DGT-OmegaT Wizard as a read-only glossary.
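
The folder conventions described above can be illustrated with a short Python sketch. It is not the DGT-OmegaT Wizard: the function name `populate_project`, the resource kinds and the file names are assumptions made for the example; only the \tm, \mt and \glossary subfolder names and the read-only glossary come from the text.

```python
import shutil
from pathlib import Path

# Hypothetical mapping from a resource kind to the project subfolder named in the text.
SUBFOLDERS = {"retrieval": "tm", "machine_translation": "mt", "glossary": "glossary"}


def populate_project(project_dir, resources):
    """Copy each resource file into the matching project subfolder.

    `resources` maps a kind ("retrieval", "machine_translation", "glossary")
    to a list of file paths. Returns the list of copied files.
    """
    project = Path(project_dir)
    copied = []
    for kind, files in resources.items():
        target = project / SUBFOLDERS[kind]
        target.mkdir(parents=True, exist_ok=True)
        for source in files:
            destination = target / Path(source).name
            shutil.copy2(source, destination)
            if kind == "glossary":
                # The filtered IATE glossary is described as read-only.
                destination.chmod(0o444)
            copied.append(destination)
    return copied
```

A call such as `populate_project("my_project", {"retrieval": ["doc-EN-PT-RET.tmx"]})` (hypothetical names) would create `my_project/tm` and copy the file there.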


4. Tagwipe 


More often than not, Office DOCX documents (the vast majority of DGT documents) contain many
useless tags (the “tag soup”). For a number of (unfortunately not so rare) particularly “bad”
documents, this makes it impossible to use OmegaT unless the Remove tags option is selected. However,
that is not the best solution, as all inline formatting is then lost in the translated document.


Many point to possible further improvements in the use of the ESIFs and call for NGOs and local
authorities to be given direct access to funds, for better enforcement of ex ante conditionalities,
sanctions for failure to uphold the partnership principle, better monitoring (through an increased
role for the Commission and Roma themselves) and action to prevent the ineffective use of funds
(e.g. training programmes not leading to employment) or their misuse (e.g. ESIF interventions
financing segregated settings), including through a transparent complaint mechanism.
Therefore, DGT developed in-house the Tagwipe application, which removes all or most redundant tags
from DOCX documents. It also improves sentence-based segmentation, something which is very
important for consistency with segmentation in MT@EC and Euramis, thereby improving fuzzy
matching with external memories.
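
To give an idea of where redundant tags come from, the following Python sketch merges neighbouring runs of a DOCX paragraph that carry exactly the same formatting. It is not Tagwipe's algorithm (Tagwipe is a Perl script and its rules are not described here); the function names are hypothetical and only one simple case of "tag soup" is handled, as an assumption for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def _format_key(run):
    """Serialise the run properties (<w:rPr>) so that two runs can be compared."""
    props = run.find("w:rPr", NS)
    return b"" if props is None else ET.tostring(props).strip()


def _is_plain_text_run(run):
    """True when the run holds only optional properties and one <w:t> element."""
    children = [child.tag for child in run]
    return children in ([f"{{{W}}}t"], [f"{{{W}}}rPr", f"{{{W}}}t"])


def merge_identical_runs(paragraph):
    """Merge neighbouring text runs that carry exactly the same formatting."""
    previous = None
    for run in list(paragraph):
        if run.tag != f"{{{W}}}r" or not _is_plain_text_run(run):
            previous = None
            continue
        if previous is not None and _format_key(previous) == _format_key(run):
            target = previous.find("w:t", NS)
            target.text = (target.text or "") + (run.find("w:t", NS).text or "")
            # Keep leading and trailing spaces when the merged text is re-read.
            target.set(XML_SPACE, "preserve")
            paragraph.remove(run)
        else:
            previous = run
    return paragraph
```

Attributes on the run element itself (for instance revision identifiers) are ignored by the comparison, which is why two visually identical runs can be merged even though Word stored them separately.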


In DGT, Tagwipe has been used since 2012, in a Windows 7 environment, for DGT-OmegaT projects
and is therefore quite stable. On the DGT-OmegaT website it is also made available for the Linux
distribution Ubuntu.


The cleaning level in Tagwipe can be chosen by the user, from level 0 to level 8. By default, the
cleaning level defined in the installation folder is the second lowest and most conservative level
("level 1").


That is the level used in DGT as translated documents have to be generated with all the original 
formatting. 


However, when that is not the case, the cleaning level can be increased, thereby eliminating
non-essential formatting such as colour highlights and non-breaking spaces, or even (almost) all
formatting when level 8 is selected.


Tagwipe is automatically used in the creation of translation projects for both CAT tools. In SDL 
Studio projects, tagwiping happens before the conversion of source files to SDLXLIFF format. 


In DGT-OmegaT projects, Tagwipe has been integrated in the DGT-OmegaT Wizard so all DOCX 
source documents are tagwiped when the projects are created or updated. 


In the example below, about 90% of the tags were eliminated by Tagwipe (from 121 tags to 14 tags) 
and segmentation was improved. The remaining tags are meaningful tags. 


Display in OmegaT Editor


Without Tagwipe

[...] better monitoring (through<t100/> an<t101/> increased role <t102/>for <t103/>the
Commission and Roma themselves) <t104/>and <t105/>action to <t106/>prevent<t107/> the ineffective
use <t108/>of funds <t109/>(e.g.<t110/> <t111/>training programmes not leading to employment) [...]


With Tagwipe

Many point to possible further improvements in the use of the <t11/>ESIFs<t12/> and call for NGOs
and local authorities to be given direct access to funds, for better enforcement of
<t13/>ex ante <t14/>conditionalities, sanctions for failure to uphold
the partnership principle, better monitoring (through an increased
role for the Commission and Roma themselves) and action to
prevent the ineffective use of funds (e.g. training programmes not
leading to employment) or their misuse (e.g. ESIF interventions
financing segregated settings), including through a transparent
complaint mechanism.


5. OmegaT in DGT 


OmegaT is a free open-source CAT tool originally developed, as a private initiative, by Keith
Godfrey in 2000. It has been vastly improved since then through many contributions. Didier Briel is
its present project manager. OmegaT is now the leading open source CAT tool.


OmegaT in DGT (DGT-OmegaT) is a fork of OmegaT 3.1.2 with a few backports from later versions.
A new version based on OmegaT 3.6.0 and following the "OmegaT standard" branch is also
available, but as a beta development. Both can be downloaded here:


> http://185.13.37.79/ to download the binary version (easy to install) and/or the
source code and respective documentation
> http://185.13.37.79:8003/ for developers and bug reporting


Despite the differences between DGT-OmegaT and OmegaT, as far as we can see, projects created in
DGT-OmegaT can be used in OmegaT and vice-versa, as DGT-OmegaT maintains interoperability
and both just ignore folders, files and information they don’t “recognise”. Some features will of
course not be available.


On the DGT-OmegaT website, detailed technical information is presented about each new or adapted
feature. In this article we will just summarise 10 main features, from the point of view of a
translator, which may be of interest to translators outside DGT.
[Screenshot: DGT-OmegaT main window, showing the Editor with the open segment, the Fuzzy Matches
pane, the Glossary pane, the Notes pane and the Comments, Dictionary and Multiple Translations tabs]


5.1. DGT Toolbar 


A plugin was developed in-house for DGT-OmegaT that produces a toolbar with icons (equivalent to
menus) which give quick access to the most frequently used functions, including features which
are specific to DGT-OmegaT, namely: DGT applications (DocFinder, Quest, Euramis and IATE),
Revision Mode and View Other Target Languages.


[Screenshot: menu bar (Project, Edit, GoTo, View, Tools, Options, Help) with the DGT toolbar below it]


In the FOSS version of DGT-OmegaT, the DocFinder, Euramis and Quest icons are greyed out as these
databases cannot be accessed outside DGT. The link to IATE is active and gives access to its publicly
available version.


Although Euramis is not available outside EU institutions, most of the EU legislation has been made
freely available on the Internet in aligned files.


5.2. View Source and Target Files 


While working on a translation, the source file of the active document in the Editor can be opened
directly from DGT-OmegaT by selecting View source file in the Project menu. This will launch the
native application associated with the source document's file type and display the source content.


There is no (real-time) preview of the translated document in OmegaT, but the completely or partially
translated file that is active in the Editor can be opened from DGT-OmegaT by selecting View target
file in the Project menu. This will launch the native application associated with the document's
file type and display the translated content. However, there is no interaction between DGT-OmegaT and
the native application, so no changes should be made to the target file in the native application, as
they will not be returned to OmegaT!


5.3. Machine Translation 


DGT doesn’t use any of the public MT systems available for several reasons, notably confidentiality 
and copyright. So a Local option has been developed for DGT-OmegaT, adding another 
implementation of Machine Translation. Instead of calling a server in real-time, this Local MT 
implementation reads data from one or more TMX files. In DGT, only the Local MT engine is 
available. 
By creating a new project subfolder named \mt and copying the MT file(s) there (something which in
DGT is automatically done by the Wizard when creating/updating a project), MT output is displayed
in the Machine Translation pane, not in the Fuzzy Matches pane.


Both with and without the plugin, the Editor Behaviour Options window now allows for the 
automatic insertion of MT if there is no Fuzzy Match above the defined minimal similarity threshold 
from the project memory or from the external memories. 
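
The idea of a Local MT engine that reads TMX files instead of calling a server, combined with automatic insertion below a similarity threshold, can be sketched in Python as follows. This is an illustration under assumptions, not the DGT-OmegaT implementation (which is written in Java): the function names, the exact-match lookup and the default threshold of 75 are hypothetical.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def load_local_mt(tmx_text, source_lang, target_lang):
    """Build a source-to-target lookup table from the content of one TMX file."""
    table = {}
    for unit in ET.fromstring(tmx_text).iter("tu"):
        segments = {}
        for variant in unit.iter("tuv"):
            lang = (variant.get(XML_LANG) or variant.get("lang") or "").lower()
            seg = variant.find("seg")
            if seg is not None:
                # itertext() also collects text around inline elements.
                segments[lang] = "".join(seg.itertext())
        if source_lang in segments and target_lang in segments:
            table[segments[source_lang]] = segments[target_lang]
    return table


def suggestion_for(source, best_fuzzy, mt_table, threshold=75):
    """Return (origin, text): the fuzzy match if it is good enough, else MT.

    `best_fuzzy` is a (score, text) pair or None; `threshold` stands for the
    minimal similarity setting (the default value here is hypothetical).
    """
    if best_fuzzy is not None and best_fuzzy[0] >= threshold:
        return ("fuzzy", best_fuzzy[1])
    if source in mt_table:
        return ("mt", mt_table[source])
    return (None, "")
```

Keeping the origin ("fuzzy" or "mt") next to the suggested text is what allows an editor to mark automatically inserted text differently from validated translations.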
5.4. Colours and identification of suggested text in the open segment in the Editor


As OmegaT implements a lot of automatic insertion mechanisms, when a new segment is opened it is 
very important to know if the suggested translation in the editing zone was already in the project 
memory or if it has been automatically inserted, but not validated. OmegaT only adds a background 
colour if the segment comes from TMX files in the tm/auto or tm/mt subfolders. DGT-OmegaT adds 
colours in some other cases and also displays more information in the starting segment delimiter. 


[Screenshot: six examples of the open segment in the Editor, with their starting segment delimiters]

Empty segment
Translated segment: <segment 0001 ** ALREADY TRANSLATED ** >
Auto-populated segment: <segment 0058 ** TM/AUTO ** >
100% match: <segment 0426 Match 100/100/100%>
Partial match: <segment 0063 +4 more Match 100/75/66% >
Machine Translation: <segment 0179 MT >


5.5. Search windows: Search Project, Search Directory, Search and Replace and Search 
and Pre-translate 


The OmegaT search feature is already very sophisticated. However, in DGT-OmegaT, the single 
Search window of OmegaT has been replaced by a set of four windows and associated menus which 
still look similar to the public OmegaT, but the code has been completely rewritten. 


Thanks to this division, the windows are less saturated... and more options could be added. These 
features are really worth exploring! 


[Screenshot: Search Project window with a search for “mobile termination rates”, showing the Text to
search, Expression mode, Search scope, Match display template and Author/Translator/date filter
areas, and a matching segment found in the memory 1-MAIN-REF\32015R2352-EN-PT-AL.tmx]


Some of the new options in the Search Project window are:


> Expression mode and Word mode which, combined, give a wider range of options and results.

> Word mode: Search options in OmegaT are string-based: when searching for "test", a segment that
contains "protestation" will be found. DGT-OmegaT adds two alternatives:

• Whole words: As in many other editing tools, the previous sample will be rejected unless the
search explicitly accepts characters before and after by using the wildcard «?» (one character)
or «*» (0 or more characters).

• Lemmas: In this mode, the screen will use tokenizers exactly as for the Fuzzy Matches pane:
grammatical inflexions of a word will be recognized, while words "containing" another one
will not. However, this is still a partial search (i.e. searching for a segment containing the
given words) without calculation of a score, while for the Fuzzy Matches pane a full search is
made and a score calculated.

> Search in has been expanded, making it easy to combine the Booleans OR, AND and NOT, a very
useful feature for translators for terminology purposes. In particular, it makes it easy to check
whether a term or expression has been consistently translated both in the project and in the
external memories.

> Search by file or folder name, which limits the search to one memory (TMX) file or to a
subfolder with several TMX files in the \tm folder.

> Memorize: Unlike in OmegaT, memorization of a search is done via a button (to be able to
selectively memorize only relevant searches) and there are options to memorize the search for any
project, the current project or only that session. Furthermore, some frequently used Regular
Expressions are memorized by default.

> Match display template: Display is configurable (as for Fuzzy Matches) using the Configure
format option.
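
The difference between the string-based and whole-word modes, and the Boolean combination of search terms, can be sketched in Python with regular expressions. This is a simplified illustration, not DGT-OmegaT code: the function names are hypothetical, the wildcards are assumed to stand for word characters only, and the Lemmas mode (which needs a tokenizer) is left out.

```python
import re


def build_pattern(query, mode="strings", case_sensitive=False):
    """Compile a search pattern for the 'strings' or 'whole_words' mode."""
    flags = 0 if case_sensitive else re.IGNORECASE
    if mode == "strings":
        return re.compile(re.escape(query), flags)
    if mode == "whole_words":
        # Translate the wildcards: "?" is one character, "*" is zero or more.
        parts = []
        for char in query:
            if char == "?":
                parts.append(r"\w")
            elif char == "*":
                parts.append(r"\w*")
            else:
                parts.append(re.escape(char))
        return re.compile(r"\b" + "".join(parts) + r"\b", flags)
    raise ValueError(f"unsupported mode: {mode}")


def matches(segment, include=(), exclude=(), combine=all, mode="strings"):
    """Boolean search: `include` terms joined by AND (all) or OR (any), minus `exclude` (NOT)."""
    found = combine(build_pattern(term, mode).search(segment) for term in include)
    rejected = any(build_pattern(term, mode).search(segment) for term in exclude)
    return bool(found) and not rejected
```

With these definitions, "test" finds "protestation" in the string-based mode but not in the whole-word mode, unless the query is written as "*test*".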


The Search and Replace window follows the same approach as the Search Project window. 


[Screenshot: Search and Replace window replacing “faixas” with “bandas”, with the list of the 12
matching segments and the confirmation prompt “You are going to modify 12 segments. Do you want to
continue?”]


It has been adapted to include a result pane which makes it possible to first check the segments that
will be affected by the Replace operation. This is very useful as there is no Undo option!


The Search and Pre-translate window is specific to DGT-OmegaT. Its role is to quickly fill
segments for which the translation is either trivial (for example, numbers and other non-translatable
characters) or is best done in a batch operation. It works in a similar way to Search and Replace, but
it searches in the source segment, so it can be used for untranslated segments.


While the Search and Replace window only allows translating with fixed text, Pre-Translation can
fill the translation with: Source (copying source to target); Match (pre-translating from
external memory matches, with a match threshold that can be defined); and Text (as for Search and
Replace, a constant string with the possibility to use variables recognized during the search).


[Screenshot: Search and Pre-translate window searching for the regular expression ^[^a-zA-Z]*$ in
untranslated segments, with Translate as set to Source and the confirmation prompt “You are going to
modify 5,669 segments. Do you want to continue?”]


This is an example of its use with Regular Expressions to pre-translate segments without
alphabetical characters (mainly numbers).
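
The same idea can be tried outside the tool with a few lines of Python. The sketch below assumes a simplified in-memory model of segments (a list of dictionaries, not the OmegaT project format) and a hypothetical function name; it copies the source to the target for untranslated segments that contain no unaccented Latin letters.

```python
import re

# Segments containing no unaccented Latin letter (digits, punctuation, spaces only).
NO_LETTERS = re.compile(r"^[^a-zA-Z]*$")


def pretranslate_with_source(segments, pattern=NO_LETTERS):
    """Fill untranslated segments whose source matches `pattern` with the source text.

    `segments` is a list of dicts with "source" and "target" keys.
    Returns the number of segments that were filled.
    """
    filled = 0
    for segment in segments:
        if not segment["target"] and pattern.search(segment["source"]):
            segment["target"] = segment["source"]
            filled += 1
    return filled
```

Already translated segments are left untouched, which mirrors the Untranslated setting of the Search in... option.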


The Search Directory follows a similar approach, with adaptations for its particular purpose. 


5.6. Glossary 


The Glossary pane uses colours to differentiate source term, target term and 3rd field (see screenshot 
in point 5) and the last field is displayed together with the respective target term. Furthermore, the 
entries after the writable glossary entries are displayed in alphabetical order making it more 
user-friendly. 


The submenu Glossary - Glossary Pane and Transtips configuration gives the translator some
important options concerning terminology display and synchronizes the word search algorithms in
order to have consistent results in both displays.


[Screenshot: configuration options: Underline glossary entries in translation; Suggest translations;
Remove tags (glossary text area only); Expression mode: Exact search, Keyword search, Case sensitive]


5.7. Match statistics (per file) as a background operation 


Match Statistics and Match Statistics per File are done in the background thereby allowing the 
translator to start translating while those statistics are being generated. This feature is especially 
interesting for large projects of hundreds/thousands of pages with large external translation memories 
as generating Match Statistics (per File) in those projects takes quite some time. 
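
The principle of computing statistics in the background while the user keeps working can be sketched in Python with a worker thread. This is only an illustration of the principle: the statistic computed here is a toy one (exact matches only) and the function names are hypothetical, not taken from DGT-OmegaT.

```python
from concurrent.futures import ThreadPoolExecutor


def match_statistics(segments, memory):
    """Toy statistic: count segments with and without an exact match in `memory`."""
    exact = sum(1 for segment in segments if segment in memory)
    return {"exact": exact, "no_match": len(segments) - exact}


def start_statistics(segments, memory):
    """Start the computation in a worker thread and return immediately."""
    executor = ThreadPoolExecutor(max_workers=1)
    future = executor.submit(match_statistics, list(segments), set(memory))
    # Let the worker finish on its own; do not block the caller here.
    executor.shutdown(wait=False)
    return future
```

The caller can go on with other work and only ask for `future.result()` when the statistics are actually needed.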


5.8. Revision mode 


In DGT, translators can also act as revisers and generally the translator has the last word, which means 
that, after a document is revised by a fellow translator acting as a reviser, the translator will check the 
changes made and can accept or reject them. Bearing that in mind, a revision workflow was developed 
in the DGT-OmegaT Wizard and the following features were developed in DGT-OmegaT. 


In DGT-OmegaT, there are now two modes (Translation Mode and Revision Mode) and segments
have three statuses: untranslated, translated and revised.


[Screenshot: Editor in Revision Mode, with the “Translation last modified by” and “Last revised by”
information above each segment, and the Fuzzy Matches pane showing the draft translation from
auto\draft\project_save.tmx with the differences from the revised text marked]


When the reviser selects the Revision Mode:


> Segments opened and in any way saved in the Editor — either changed or not — are marked with 
Last revised by and the login of the reviser. This information is displayed for those segments in the 
Editor and, if so configured, in the Fuzzy Matches and Search windows. This mark cannot be 
deleted... but an export can be made without that mark. 

> By having the unrevised project memory copied to the \tm folder (identified as Draft, for instance), 
track changes are displayed in target as long as the new option View diff in target is activated in 
the External TMX options. 

> There is the new option Go to Next Unrevised Segment with the shortcut Ctrl+U (the same used 
in the Translation Mode to Go to Next Untranslated Segment). 


When the translator receives the revised project to finalize, (s)he can easily and quickly check the 
segments changed by the reviser with these 2 new options: 


> Mark Revised and Changed Segments in the View menu, whereby the segments changed by 
the reviser are displayed with a red background. 

> Go To Next/Previous Revised & Changed Segment (and its shortcuts), thereby successively 
opening only the changed segments for checking. 


The Revision Mode can also be used by the translator when a particular project is not revised by a 
colleague, in which case the translator is his/her own reviser, as happens with a substantial number 
of documents. 
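
The revision logic described above can be summarised in a short sketch. It is only an illustration in Python, not the actual DGT-OmegaT code (which is written in Java): the names Segment, Status, save_in_revision_mode and next_unrevised are hypothetical, and the sketch assumes that "unrevised" means any segment that does not yet carry a Last revised by mark. 

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Status(Enum):
    UNTRANSLATED = "untranslated"
    TRANSLATED = "translated"
    REVISED = "revised"


@dataclass
class Segment:
    source: str
    target: str = ""
    draft_target: str = ""            # translator's version, as in the Draft memory
    revised_by: Optional[str] = None  # login shown in "Last revised by"

    @property
    def status(self) -> Status:
        if not self.target:
            return Status.UNTRANSLATED
        return Status.REVISED if self.revised_by else Status.TRANSLATED

    @property
    def revised_and_changed(self) -> bool:
        # True only when the reviser actually altered the translator's text.
        return self.revised_by is not None and self.target != self.draft_target


def save_in_revision_mode(seg: Segment, new_target: str, reviser: str) -> None:
    """Saving in Revision Mode marks the segment, whether or not the text changed."""
    seg.target = new_target
    seg.revised_by = reviser


def next_unrevised(segments: List[Segment], current: int) -> Optional[int]:
    """Index of the next segment that is not revised, wrapping around; None if none is left."""
    n = len(segments)
    for step in range(1, n + 1):
        i = (current + step) % n
        if segments[i].status is not Status.REVISED:
            return i
    return None
```

In this model, a segment saved unchanged by the reviser becomes revised but is not flagged as revised and changed, which is what allows the translator to jump only to the segments that really need checking. 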


5.9. Pseudo-tags: insertion of tags in target not present in source 


In DGT-OmegaT, there is also an option to add formatting to the target which does not exist in the 
source segment using the feature Format (pseudo-tags). 


Highlighting the text to be formatted, right-clicking the mouse and selecting the desired format 
(italics, bold, underline, superscript and subscript) inserts characters which exist in Unicode but are 
rarely used. 


[Screenshot: the Editor context menu with the pseudo-tag options (Set bold, Set superscript, Set 
subscript, among others) applied to the selected text EX POST.] 


When the translated documents are generated, the Reformatter script transforms those pseudo-tags 
into the corresponding target format. However, pseudo-tags should be used sparingly, as other 
formatting in the segment may sometimes be affected. It is not perfect (and it works only for DOCX 
files, for the moment), but it can be very useful! 
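
To give an idea of what such a transformation involves, here is a minimal sketch in Python. It is not the Reformatter script itself. It assumes that each format is marked by one rarely used character placed before and after the formatted text; the marker characters below (taken from the Unicode Private Use Area) and the function name split_runs are hypothetical, as the characters actually used by DGT-OmegaT are not listed in this article. 

```python
from typing import List, Optional, Tuple

# Hypothetical marker characters (Private Use Area); the characters really
# used by DGT-OmegaT are not given in the article.
MARKERS = {
    "\uE000": "italic",
    "\uE001": "bold",
    "\uE002": "underline",
    "\uE003": "superscript",
    "\uE004": "subscript",
}

Run = Tuple[str, Optional[str]]  # (text, format name or None for plain text)


def split_runs(target: str) -> List[Run]:
    """Split a target segment into (text, format) runs.

    A marker opens a formatted span and the same marker closes it.
    Nested spans are not supported: a different marker inside an open span
    is ignored, and text after an unclosed marker is left unformatted.
    """
    runs: List[Run] = []
    buf: List[str] = []
    active: Optional[str] = None  # marker currently open, if any

    def flush(fmt: Optional[str]) -> None:
        if buf:
            runs.append(("".join(buf), fmt))
            buf.clear()

    for ch in target:
        if ch in MARKERS and active is None:
            flush(None)            # plain text before the span
            active = ch
        elif ch == active:
            flush(MARKERS[ch])     # formatted text inside the span
            active = None
        elif ch in MARKERS:
            continue               # other marker inside an open span: ignored
        else:
            buf.append(ch)
    flush(None)
    return runs
```

Each run could then be written to the target document with the corresponding character formatting. The restriction to one format per span is a simplification of this sketch, not a documented limitation of the tool. 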


5.10. View other target languages (Cross-lingual concordance) 


A substantial part of EU documents is translated simultaneously into many or all official EU 
languages. For the purpose of multilingual consistency, it is therefore useful for translators to be able 
to look at ongoing translations into other languages. OmegaT already has a good feature for that 
purpose, the tmx2source folder, but the problem is that files have to be requested and copied to it 
manually. 


To address this, the installation of SDL Studio in DGT has the following adaptation: each time a 
translation is saved (as ongoing, not final!), the SDLXLIFF file is copied to a common folder. At any 
time, the translator of the same document in another language can use a contextual menu (a plugin 
inside Studio) which converts this SDLXLIFF file into HTML and displays it in the browser. 


In DGT-OmegaT, View Other Target Languages combines this with other options. 
First, it works in both directions: OmegaT users can see Studio documents and vice versa. Second, 
DGT-OmegaT users can choose to have them automatically converted to TMX (not to HTML) and 
copied to the tmx2source folder (for display in the Editor) and/or to the tm/penalty-50 folder (for 
display in the Fuzzy Matches pane). 
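
The second option can be pictured with the following sketch, which writes segment pairs from another target language as a small TMX file into both folders. It is an illustration in Python under stated assumptions, not the DGT-OmegaT implementation: the function names build_tmx and publish are hypothetical, the conversion from SDLXLIFF is left out (the segment pairs are assumed to be already extracted), and tmx2source is assumed to sit under the project's tm folder, as in standard OmegaT. 

```python
import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterable, List, Tuple

# ElementTree serialises this qualified name as the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_tmx(pairs: Iterable[Tuple[str, str]], src_lang: str, tgt_lang: str) -> ET.ElementTree:
    """Build a minimal TMX 1.4 tree from (source, target) segment pairs."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "cross-lingual-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "unknown",
        "adminlang": "EN",
        "srclang": src_lang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in pairs:
        tu = ET.SubElement(body, "tu")
        for lang, text in ((src_lang, source), (tgt_lang, target)):
            tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
            ET.SubElement(tuv, "seg").text = text
    return ET.ElementTree(tmx)


def publish(project_dir: Path, lang: str, pairs: List[Tuple[str, str]],
            src_lang: str = "EN", to_editor: bool = True,
            to_fuzzy: bool = True) -> List[Path]:
    """Write <lang>.tmx to tm/tmx2source and/or tm/penalty-50 (assumed layout)."""
    tree = build_tmx(pairs, src_lang, lang)
    targets: List[Path] = []
    if to_editor:
        targets.append(project_dir / "tm" / "tmx2source" / f"{lang}.tmx")
    if to_fuzzy:
        targets.append(project_dir / "tm" / "penalty-50" / f"{lang}.tmx")
    for path in targets:
        path.parent.mkdir(parents=True, exist_ok=True)
        tree.write(str(path), encoding="utf-8", xml_declaration=True)
    return targets
```

Placing the file in a penalty-50 folder means that even an identical source segment is offered only as a 50% match, so translations into other languages can be consulted without ever being mistaken for, or automatically inserted as, a translation into the translator's own language. 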


[Screenshot: in the Editor, the French and Italian translations (FR: l'amélioration des compétences 
multiculturelles; IT: il miglioramento delle competenze multiculturali;) are displayed under the 
source segment "improved multicultural competences"; in the Fuzzy Matches pane, the French 
translation from penalty-50\FR.tmx appears as a 50% match.] 


To be continued in Part 2 ... 